Yue XIE Ruiyu LIANG Zhenlin LIANG Xiaoyan ZHAO Wenhao ZENG
To enhance emotion features and improve the performance of speech emotion recognition, an attention mechanism is employed to identify the important information in both the time and feature dimensions. In the time dimension, multi-head attention is modified to use the last state of the long short-term memory (LSTM) output, matching the time-accumulation characteristic of the LSTM. In the feature dimension, scaled dot-product attention is replaced with additive attention, which follows the state-update method of the LSTM, to construct multi-head attention. That is, a nonlinear transformation replaces the linear mapping of classical multi-head attention. Experiments on the IEMOCAP dataset demonstrate that the attention mechanism can enhance emotional information and improve the performance of speech emotion recognition.
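As a rough illustration of the ideas named above, the following sketch applies generic additive (tanh-based) attention over time, using the last LSTM output as the query. It is a single-head, textbook-style formulation and not the authors' architecture; the function name and the parameters `W_h`, `W_q`, and `v` are hypothetical.

```python
import numpy as np

def additive_attention(H, W_h, W_q, v):
    """Additive attention pooling over time (illustrative sketch only).

    H   : (T, d) sequence of LSTM outputs
    W_h : (d, a) projection applied to every time step (assumed parameter)
    W_q : (d, a) projection applied to the query (assumed parameter)
    v   : (a,)   scoring vector (assumed parameter)
    """
    query = H[-1]  # last LSTM state, which has accumulated the whole sequence
    # Nonlinear (tanh) scoring instead of a scaled dot product.
    scores = np.tanh(H @ W_h + query @ W_q) @ v  # shape (T,)
    scores = scores - scores.max()               # numerical stability for softmax
    weights = np.exp(scores) / np.exp(scores).sum()
    context = weights @ H                        # weighted sum over time, shape (d,)
    return context, weights
```

The returned weights form a probability distribution over time steps, so the context vector is a convex combination of the LSTM outputs.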
For massive multiple-input multiple-output (MIMO) communication systems, simple linear detectors such as zero forcing (ZF) and minimum mean square error (MMSE) can achieve near-optimal detection performance with reduced computational complexity. However, such linear detectors always involve complicated matrix inversion, which incurs high computational overhead in practical implementations. Owing to their massively parallel processing and efficient hardware implementation, neural networks have become a promising approach to signal processing for future wireless communications. In this paper, we first propose an efficient neural network, termed the PINN, that calculates the pseudo-inverse of any type of matrix based on an improved Newton's method. Through detailed analysis and derivation, the linear massive MIMO detectors are mapped onto PINNs, which can take full advantage of the research achievements of neural networks in both algorithms and hardware. Furthermore, an improved limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) quasi-Newton method is studied as the learning algorithm of PINNs to achieve a better performance/complexity trade-off. Simulation results validate the efficiency of the proposed scheme.
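For orientation, the classical Newton (Newton-Schulz) iteration for the pseudo-inverse can be sketched as follows. This is the textbook iteration, not the improved method or the PINN described in the abstract; the function name, the tolerance, and the iteration cap are assumptions made for illustration.

```python
import numpy as np

def newton_pinv(A, max_iters=100, tol=1e-12):
    """Classical Newton-Schulz iteration X <- X (2I - A X) (illustrative sketch)."""
    A = np.asarray(A, dtype=complex)
    # Standard starting point: X0 = A^H / (||A||_1 * ||A||_inf), which keeps the
    # initial scaling below 1 / sigma_max^2 so that the iteration converges.
    X = A.conj().T / (np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))
    I = np.eye(A.shape[0])
    for _ in range(max_iters):
        X_next = X @ (2 * I - A @ X)  # only matrix products, no explicit inversion
        if np.linalg.norm(X_next - X) <= tol * np.linalg.norm(X_next):
            X = X_next
            break
        X = X_next
    return X
```

In a noiseless ZF setting with channel matrix `H` and received vector `y = H @ x`, the estimate would be `newton_pinv(H) @ y`. The iteration uses only matrix multiplications, which is what makes it attractive for parallel hardware.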
Ruilin LI Bing SUN Chao LI Shaojing FU
A T-function is a kind of cryptographic function that has been shown to be useful in various applications. It is known that any function $f$ on $\mathbb{F}_2^n$ or $\mathbb{Z}_{2^n}$ induces a unique polynomial $f_F \in \mathbb{F}_{2^n}[x]$ with degree $\le 2^n - 1$. In this letter, we study an algebraic property of $f_F$ when $f$ is a T-function. We prove that for a single-cycle T-function $f$ on $\mathbb{F}_2^n$ or $\mathbb{Z}_{2^n}$, $\deg f_F = 2^n - 2$, which is optimal for a permutation. We also consider a T-function widely used in cryptographic algorithms, namely the modular addition function $A_b(x) = x + b \in \mathbb{Z}_{2^n}[x]$, and demonstrate how to calculate $\deg (A_b)_F$ from the constant $b$. These results help evaluate the immunity of T-function-based cryptosystems against known attacks such as the interpolation attack and the integral attack.
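The degree statement can be checked numerically for small n. The sketch below computes the degree of the interpolation polynomial over a field with 2^n elements by brute force, using the standard coefficient formula c_k = sum over a of f(a) * a^(2^n - 1 - k) for k >= 1 and c_0 = f(0). The identification of the integer a with the field element having the same bit pattern, the choice of irreducible polynomial, and all function names are assumptions for illustration; the letter may use a different convention.

```python
def gf_mul(a, b, n, mod_poly):
    """Multiply two elements of GF(2^n); mod_poly includes the x^n term."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> n) & 1:   # reduce as soon as degree n is reached
            a ^= mod_poly
    return result

def gf_pow(a, e, n, mod_poly):
    """Compute a^e in GF(2^n), with the convention 0^0 = 1."""
    result = 1
    for _ in range(e):
        result = gf_mul(result, a, n, mod_poly)
    return result

def interpolation_degree(f, n, mod_poly):
    """Degree of the unique polynomial of degree <= 2^n - 1 that agrees with f."""
    q = 1 << n
    coeffs = [0] * q
    coeffs[0] = f(0)
    for k in range(1, q):
        c = 0
        for a in range(q):
            c ^= gf_mul(f(a), gf_pow(a, q - 1 - k, n, mod_poly), n, mod_poly)
        coeffs[k] = c
    return max((k for k, c in enumerate(coeffs) if c), default=-1)

# Example: n = 3 with x^3 + x + 1, and the single-cycle map x -> x + 1 mod 8.
degree = interpolation_degree(lambda x: (x + 1) % 8, 3, 0b1011)
```

Under these assumptions, `degree` evaluates to 6, which equals 2^3 - 2 and agrees with the stated result for single-cycle T-functions.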
Jianguo TAN Wenjun ZHANG Peilin LIU
Sinusoidal representation has been widely applied to speech modification and to low-bit-rate speech and audio coding. Usually, the speech signal is analyzed and synthesized using the overlap-add algorithm or the peak-picking algorithm. However, the overlap-add algorithm is known for its high computational complexity, and the peak-picking algorithm cannot track transient and syllabic variations well. In this letter, both algorithms are applied to speech analysis/synthesis. Peaks are picked from the power spectral density curve of the speech signal, and the frequencies corresponding to these peaks are arranged in descending order of power spectral density. These frequencies are taken as candidate frequencies, and the corresponding amplitudes and initial phases are determined according to the least mean square error criterion. The sum of the extracted sinusoidal components is used to successively approximate the original speech signal. The results show that the proposed algorithm can track transient and syllabic variations and yields good synthesized speech with low computational complexity.
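The analysis steps described above can be sketched roughly as follows for a single frame. This is a simplified reading of the abstract, not the authors' implementation: the power spectral density is approximated by a squared FFT magnitude, and the function name and the fixed number of components are assumptions.

```python
import numpy as np

def extract_sinusoids(x, fs, num_components):
    """Pick spectral peaks, then fit amplitude and phase by least squares."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x)) / fs
    psd = np.abs(np.fft.rfft(x)) ** 2  # crude PSD estimate (assumption)
    # Local maxima of the PSD, excluding the DC and Nyquist bins.
    peaks = [k for k in range(1, len(psd) - 1)
             if psd[k] > psd[k - 1] and psd[k] >= psd[k + 1]]
    # Candidate frequencies in descending order of PSD.
    peaks.sort(key=lambda k: psd[k], reverse=True)
    residual = x.copy()
    params = []
    for k in peaks[:num_components]:
        freq = k * fs / len(x)
        basis = np.column_stack([np.cos(2 * np.pi * freq * t),
                                 np.sin(2 * np.pi * freq * t)])
        # Least-squares fit of a*cos + b*sin to the current residual.
        (a, b), *_ = np.linalg.lstsq(basis, residual, rcond=None)
        amplitude = np.hypot(a, b)
        phase = np.arctan2(-b, a)  # a*cos + b*sin = A*cos(2*pi*f*t + phase)
        params.append((freq, amplitude, phase))
        residual = residual - basis @ np.array([a, b])  # successive approximation
    return params, residual
```

Each extracted component is subtracted from the residual before the next one is fitted, so the sum of the components approaches the original frame step by step.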
This paper proposes a scheme for reducing pilot interference in cell-free massive multiple-input multiple-output (MIMO) systems through scalable access point (AP) selection and efficient pilot allocation using the Grey Wolf Optimizer (GWO). Specifically, we introduce a bidirectional large-scale fading-based (B-LSFB) AP selection method that builds high-quality connections benefiting both APs and user equipments (UEs). Then, we limit the number of UEs that each AP can serve and encourage competition among UEs to improve the scalability of this approach. Additionally, we propose a grey wolf optimization-based pilot allocation (GWOPA) scheme to minimize pilot contamination. In this scheme, we first define a fitness function to quantify the level of pilot interference between UEs, and then construct dynamic interference relationships between any UE and its serving AP set using a weighted fitness function to minimize pilot interference. The simulation results show that the B-LSFB strategy achieves scalability with performance similar to that of large-scale fading-based (LSFB) AP selection. Furthermore, the GWOPA scheme significantly improves per-user net throughput with low complexity compared to four existing schemes.